@yuankaichen-amd (Contributor):

This CLI takes a *-pretrain.yaml file as input and projects memory usage on a single worker (rank 0 by default). It also prints a per-submodule breakdown of parameter counts and activation memory usage for the model.

Currently it only supports Megatron config files.

Example usage:

NNODES=96 PRIMUS_MODEL=deepseek_proxy_2T PRIMUS_EP=16 PRIMUS_PP=12 bash runner/primus-cli direct --no-gpu --single -- projection memory --config examples/megatron/configs/MI300X/deepseek_v2-pretrain.yaml

Example output:

Total Number of Parameters: 2049.574961 Billion (2,049,574,961,152.0)

[embedding]
Params: 0.822084 Billion (822,083,584)
Activation Memory: 0.0625 GB

[dense_transformer_layer]
Params: 0.453018 Billion (453,017,600.0)
Activation Memory: 0.6250 GB
...

[Primus:Projection] Memory Projection Summary on Rank 0:
Params: 13.006799 Billion (13,006,798,848.0)
Param+Optimizer Memory: 78.7379 GB
Activation Memory (per batch size 1, seq len 4096): 395.3125 GB
Projected Total Memory: 474.0504 GB
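
For context, the projected total in this summary appears to be the param+optimizer figure plus the per-rank activation figure. A minimal sketch of that bookkeeping (the variable names below are illustrative, not the tool's actual API):

# Illustrative only: how the summary numbers above combine on rank 0.
param_and_optimizer_gb = 78.7379      # parameters + optimizer states
activation_gb = 395.3125              # activations at micro batch size 1, seq len 4096
projected_total_gb = param_and_optimizer_gb + activation_gb
print(f"Projected Total Memory: {projected_total_gb:.4f} GB")  # -> 474.0504 GB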

Follow-up work:

(1) Consolidate parameters in the projection config;
(2) Add Torchtitan support.

Reviewed snippet (from the MLP activation-memory estimate):

# First GEMM output activations (bf16, 2 bytes per element)
total += num_tokens * self.config.model_config.ffn_hidden_size * 2
# Second GEMM output activations (bf16, 2 bytes per element)
total += num_tokens * self.config.model_config.ffn_hidden_size * 2
return total
Review comment (Contributor):

The non-SwiGLU case and tensor model parallelism are not taken into account here.
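
For illustration, one way to fold the gated/non-gated distinction and tensor model parallelism into this estimate is sketched below. The names tp_size, gated_linear_unit, and bytes_per_elem are assumptions for the sketch, not necessarily the actual Primus config fields, and the second-GEMM accounting mirrors the original snippet above:

def mlp_activation_bytes(num_tokens, ffn_hidden_size, tp_size=1,
                         gated_linear_unit=False, bytes_per_elem=2):
    # Per-rank FFN width under tensor model parallelism (column/row-parallel GEMMs).
    ffn_per_rank = ffn_hidden_size // tp_size
    # First GEMM: gated variants (e.g. SwiGLU) store both gate and up projections,
    # doubling the output compared with a plain MLP.
    first_gemm = num_tokens * ffn_per_rank * bytes_per_elem
    if gated_linear_unit:
        first_gemm *= 2
    # Second GEMM input after the (optional) gating collapses back to ffn_per_rank.
    second_gemm = num_tokens * ffn_per_rank * bytes_per_elem
    return first_gemm + second_gemm

# Example: 4096 tokens, ffn_hidden_size 12288, TP=8, SwiGLU enabled.
print(mlp_activation_bytes(4096, 12288, tp_size=8, gated_linear_unit=True) / 2**30, "GiB")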

@Xiaoming-AMD merged commit 0d567b8 into main on Nov 12, 2025 (3 checks passed).